This study introduces a model, utilizing item response theory, for dealing with various rules that students use in solving problems.

Siegler (1976, 1978) developed a rule assessment method for handling choice data and used it in the context of cognitive development on the balance scale. Anderson (1974, 1981) developed a functional measurement methodology for assessing algebraic integration rules. Wilkening and Anderson (1982) compared the two methods (Siegler's binary decision tree method and Anderson's functional measurement method) and discussed their advantages and disadvantages. Wilkening and Anderson state that the binary decision tree methodology does not resolve the underlying problem of lacking an error theory to handle response variability. The functional measurement method allows for unreliability or variability in the responses and uses analysis of variance to assess the goodness of fit between rules and data. It seems, however, that both methods are more suitable for investigating the basic foundations of knowledge structure and development than for conducting evaluative studies on performance data.

This study will introduce a measurement model using item response theory (IRT) for dealing with the misconceptions held by many students. Although the purpose of the model is neither to discover an unknown source of misconception from responses nor to represent knowledge structure, as the binary decision tree method does, it is capable of diagnosing many erroneous rules. The primary purpose of the model is to establish an interface between cognitive processes and psychometrics.


It is useful to know the transitional behavior of error types, which may be due to a change of instructional methods, advancement of learning stages, or the stability and persistence of particular misconceptions. Such knowledge can help to evaluate instruction, measure the outcome of learning, and obtain diagnostic information for designing remedial instruction tailored to the type of misconception.

The model should be able to express various aspects of misconceptions quantitatively so that they can be statistically related to other measures such as motivation or creativity.

First, the model, which is named "rule space," will be introduced. Rule space is formulated by using IRT models so as to facilitate probabilistic treatment of the "behaviors" of misconceptions. An index measuring the "usualness" of responses will also be briefly described, because it is used as one of the coordinates of rule space. Second, rule space will be illustrated with signed-number arithmetic data, and the responses generated from various erroneous rules will be shown as points in rule space. Then we will discuss a technique for assessing rules used inconsistently by a student due to "slips" or the instability of his/her misconceptions. Using pattern classification techniques (Fukunaga, 1972) to determine the student's latent state of knowledge (Tatsuoka & Tatsuoka, 1981) or to find his/her misconception(s) seems very useful when taking the variability of errors into account.
Rule Space

All erroneous rules of operation in signed-number arithmetic (that have already been discovered) can be expressed as points in a geometric space called "rule space." In other words, rule space is a geometric representation of the rules used by students. Before the formulation of rule space is presented, the extended caution index (which measures the degree of anomaly in response patterns) will be briefly introduced.

Extended Caution Index (ECI)

A group of extended caution indices, which provide information from patterns of responses to test items that is not contained in the total score, was introduced by Tatsuoka and Linn (1981, 1983). Similar indices based on IRT (Wright & Stone, 1977; Levine & Rubin, 1979) were introduced as identifiers of "guessing, sleeping, fumbling and plodding" (Wright & Stone, 1977, p. 110) or of examinees who are "so atypical ... that his or her aptitude test score fails to be a completely appropriate measure" (Levine & Rubin, 1979, p. 269). Statistical properties of the ECIs have been investigated by Tatsuoka and Tatsuoka (1982). The raw ECIs are standardized (SECIs) by subtracting their conditional expectations and then dividing by their conditional standard errors. By so doing, the SECIs provide values that are comparable between different levels of the person parameter.

The values of the ECIs are calculated by first constructing two matrices. One is a binary score matrix $(y_{ij})$, $i = 1, \ldots, N$, $j = 1, \ldots, n$, where N is the number of students and n is the number of items in a test. The other is a probability matrix with elements $P_{ij}$, which are values of a logistic function with one, two, or three parameters, defined as

$$P_{ij} = c_j + \frac{1 - c_j}{1 + \exp[-D a_j(\theta_i - b_j)]}$$

where $c_j$ is the guessing parameter, $a_j$ is the item discriminating power, $b_j$ is the item difficulty, $\theta_i$ is person i's ability or achievement level, and D is a scaling constant, usually set to 1.7 (Lord & Novick, 1968; Lord, 1980).

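As a sketch of the probability matrix just defined, the following pure Python computes $P_{ij}$ under the three-parameter logistic model; setting $c_j = 0$ gives the two-parameter case. The value D = 1.7 and the function names are assumptions made for illustration, not part of the original study.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed; the text does not fix it)


def p_logistic(theta, a, b, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    c = 0 gives the two-parameter model; c = 0 together with a common a for
    all items gives the one-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def prob_matrix(thetas, items):
    """Return the N x n matrix (P_ij) as a list of rows.

    thetas: person parameters theta_i; items: list of (a_j, b_j, c_j) triples.
    """
    return [[p_logistic(t, a, b, c) for (a, b, c) in items] for t in thetas]


P = prob_matrix([-1.0, 0.0, 1.0], [(1.0, 0.0, 0.2), (1.5, 0.5, 0.0)])
print(P[1][0])  # at theta = b the probability is c + (1 - c) / 2 = 0.6
```

In practice the estimated parameters would be plugged in, giving $\hat{P}_{ij}$.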
In practice, the estimates $\hat{P}_{ij}$, obtained by substituting the estimated item and person parameters for $a_j$, $b_j$, $c_j$, and $\theta_i$ in the logistic function, can be used. One of the ECIs, ECI4, is defined as an index reflecting the anomaly of an actual response pattern at a given level of ability $\theta_i$. It is the complement of the ratio of two covariances (taken over the n items): the numerator is the covariance of the ith row vector, $\mathbf{y}_i$, of $(y_{ij})$ and the ith row, $\mathbf{P}_i$, of the probability matrix $(P_{ij})$; the denominator is the covariance of the column-mean vector of $(P_{ij})$, $\mathbf{G} = (G_{.1}, G_{.2}, \ldots, G_{.n})$, and the ith row vector $\mathbf{P}_i$. That is,

$$\mathrm{ECI4} = 1 - \frac{\mathrm{cov}(\mathbf{y}_i, \mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)}, \qquad \text{where } G_{.j} = \frac{1}{N}\sum_{i=1}^{N} P_{ij}.$$

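The conditional expectation and variance of ECI4 are given by

$$E(\mathrm{ECI4} \mid \theta_i) = 1 - \frac{\mathrm{Var}(\mathbf{P}_i)}{\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)}$$

and

$$\mathrm{Var}(\mathrm{ECI4} \mid \theta_i) = \frac{\sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2}{n^2 \, \mathrm{cov}^2(\mathbf{G}, \mathbf{P}_i)}$$

Subtracting the expectation from ECI4 and dividing by the square root of the variance (with $\mathrm{cov}(\mathbf{G}, \mathbf{P}_i) > 0$), the standardized ECI4 is thus given by

$$\mathrm{ECI4}_z = \frac{n \, \mathrm{cov}(\mathbf{P}_i - \mathbf{y}_i, \mathbf{P}_i)}{\sqrt{\sum_{j=1}^{n} \sigma_{ij}^2 (P_{ij} - T_i)^2}}$$

where $T_i = \frac{1}{n}\sum_{j=1}^{n} P_{ij}$ is the row mean of $(P_{ij})$ and $\sigma_{ij}^2 = P_{ij}(1 - P_{ij})$ is the variance of item j at level i.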

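The formulas above can be checked with a minimal pure-Python sketch. The function names are ours, and the sketch assumes local independence (so that $\mathrm{Var}(y_{ij} \mid \theta_i) = P_{ij}(1 - P_{ij})$), covariances with divisor n taken across items, and a positive denominator $\mathrm{cov}(\mathbf{G}, \mathbf{P}_i)$.

```python
import math


def _cov(u, v):
    """Covariance across items with divisor n."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n


def eci4_stats(Y, P, i):
    """Raw ECI4, its conditional mean and variance, and ECI4z for person i.

    Y: N x n binary score matrix; P: N x n probability matrix (lists of rows).
    Assumes local independence and cov(G, P_i) > 0.
    """
    N, n = len(P), len(P[0])
    G = [sum(P[k][j] for k in range(N)) / N for j in range(n)]  # column means G_.j
    y, p = Y[i], P[i]
    T = sum(p) / n                                              # row mean T_i
    denom = _cov(G, p)
    eci4 = 1.0 - _cov(y, p) / denom
    expected = 1.0 - _cov(p, p) / denom
    s = sum(pj * (1.0 - pj) * (pj - T) ** 2 for pj in p)        # sum of sigma_ij^2 (P_ij - T_i)^2
    variance = s / (n ** 2 * denom ** 2)
    eci4z = n * _cov([pj - yj for pj, yj in zip(p, y)], p) / math.sqrt(s)
    return eci4, expected, variance, eci4z


P = [[0.9, 0.7, 0.5, 0.3], [0.8, 0.6, 0.4, 0.2], [0.95, 0.85, 0.6, 0.4]]
Y = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0]]
print(eci4_stats(Y, P, 0)[3], eci4_stats(Y, P, 1)[3])
```

In this toy example the first person, who answers the easy items correctly, gets a negative $\mathrm{ECI4}_z$, while the second person, who answers only the hard items correctly, gets a positive one; ECI4z also equals (ECI4 - E)/sqrt(Var), as the derivation requires.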
Tatsuoka and Tatsuoka (1982) showed empirically that the standardized ECI4 (SECI4) has an approximately normal distribution. This is not surprising, because ECI4 is a weighted arithmetic mean of $P_{ij}$, $j = 1, 2, \ldots, n$, while the appropriateness measures developed by Levine and Rubin (1979) and Drasgow (1982) correspond to a geometric mean of $P_{ij}$.

Both extreme tails of the distribution correspond to more unusual response patterns, while points in the middle indicate usual, typical response patterns. Harnisch and Tatsuoka (1983) examined empirically the relationship between SECIs and total scores, finding that SECI correlates nearly zero with total scores, both linearly and curvilinearly.

Component scoring: decomposing the regular scoring procedure of "right" or "wrong" into finer components

Many erroneous rules in arithmetic can produce the right answer for a given item (Van Lehn, 1982; Birenbaum & Tatsuoka, 198?; Tatsuoka & Tatsuoka, 1982, 1983; Davis, 1930). For example, the item -10 - (-4) can be answered correctly by the following three erroneous rules: (1) always subtracting the two numbers and taking the sign of the number with the larger absolute value; (2) changing the minus operation sign to addition, misreading the parentheses as absolute-value bars, and then applying the right rule for addition; (3) converting the subtraction to an addition problem by changing the sign of the second number, then subtracting the smaller absolute value from the larger absolute value and giving the answer the sign of the first number. These three erroneous rules, which are committed by a substantial number of seventh graders (Birenbaum & Tatsuoka, 1983), produce the right answer for all subtraction problems in which the first number has the larger absolute value. But if we give a second item, 4 - (-16), then rule (1) produces -12, and rules (2) and (3) yield the answers +20 and +12, respectively. Therefore, if we select an appropriate set of items, each rule will correspond to a unique set of responses to those items.

It is not always true, however, that the traditional scoring of "right" or "wrong" for responses to the items produces a unique binary response pattern corresponding to each rule.

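To make the example above concrete, here is a small sketch that encodes the three erroneous rules, as we read their verbal descriptions, for an item written x - (y). The function names and the encoding are our own illustrative assumptions, not code from the study.

```python
def right_rule(x, y):
    """Correct answer to the item x - (y)."""
    return x - y


def rule1(x, y):
    # (1) Subtract the smaller absolute value from the larger and take the
    #     sign of the number with the larger absolute value.
    larger = x if abs(x) >= abs(y) else y
    magnitude = abs(abs(x) - abs(y))
    return magnitude if larger >= 0 else -magnitude


def rule2(x, y):
    # (2) Change "-" to "+", read (y) as |y|, then add correctly.
    return x + abs(y)


def rule3(x, y):
    # (3) Change "-" to "+" and negate y, then subtract the smaller absolute
    #     value from the larger and give the result the sign of x.
    second = -y
    magnitude = abs(abs(x) - abs(second))
    return magnitude if x >= 0 else -magnitude


items = [(-10, -4), (4, -16)]
for name, rule in [("right", right_rule), ("rule 1", rule1),
                   ("rule 2", rule2), ("rule 3", rule3)]:
    print(name, [rule(x, y) for x, y in items])
```

All four functions return -6 on the first item, while on the second they give 20, -12, 20, and 12, matching the answers quoted in the text.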
Tatsuoka and Baillie (1982) pointed out that there are several erroneous rules whose response patterns under the traditional scoring procedure are identical but which can be distinguished by decomposing the unit of the answer into finer components. Tatsuoka and Tatsuoka (1981) also listed the response patterns of 45 erroneous rules in signed-number arithmetic obtained from the regular scoring procedure. Some of the 45 binary patterns of 16 items are identical, although the descriptions of the erroneous rules that produced these identical patterns are not. There is no way to distinguish two such different rules just by looking at their binary response patterns.

However, all the erroneous rules discovered so far in signed-number addition and subtraction problems can be expressed uniquely as sets of binary response patterns resulting from the component scoring procedure, which is obtained by decomposing the regular scoring procedure into finer components: for example, the sign part and the absolute-value part of the answer to a given item in signed-number problems, or, for a fraction problem, the three components (whole number, numerator, and denominator) of the answer. The regular response patterns are elementwise products of the component response patterns. Table 1 describes this procedure with four examples of signed-number subtraction.

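A minimal sketch of component scoring for signed-number answers follows. It scores the sign and the absolute value of each answer separately and forms the regular right/wrong pattern as their elementwise product. The answer key and the two response vectors are hypothetical.

```python
def sign(v):
    """Return -1, 0, or +1."""
    return (v > 0) - (v < 0)


def component_scores(answers, keys):
    """Score answers on their sign and absolute-value parts separately.

    Returns (sign_pattern, abs_pattern, regular_pattern); the regular
    right/wrong pattern is the elementwise product of the two components.
    """
    sign_pat = [int(sign(a) == sign(k)) for a, k in zip(answers, keys)]
    abs_pat = [int(abs(a) == abs(k)) for a, k in zip(answers, keys)]
    regular = [s * m for s, m in zip(sign_pat, abs_pat)]
    return sign_pat, abs_pat, regular


keys = [-6, 20, -5]  # hypothetical key for -10 - (-4), 4 - (-16), 3 - 8
print(component_scores([-6, -12, 5], keys))
print(component_scores([-6, 12, 6], keys))
```

Both students share the regular pattern [1, 0, 0], yet their sign patterns ([1, 0, 0] versus [1, 1, 0]) and absolute-value patterns ([1, 0, 1] versus [1, 0, 0]) differ. This is the extra information that component scoring recovers.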
Insert  Table  1  about  here 

Hereafter we will use this new scoring method, the component scoring procedure, in this study. Even though the rationale of component scoring is based on a signed-number study, it may be generalized to other domains of arithmetic or mathematics.

Rule space: True score and SECI4 for component response patterns

Recall that each student's regular response pattern obtained by regular scoring is decomposed into component response patterns, so that his/her responses to the test items are now represented by component response patterns. The Euclidean space determined by the four variables $T_i^a$, $T_i^s$, $\mathrm{SECI4}_i^a$, and $\mathrm{SECI4}_i^s$ will be called "rule space" hereafter. For example, the four rules of Table 1 are expressed as four points in the rule space.

Lord and Novick (1968) and Lord (1980) defined the test characteristic function (or test response curve) as the average of the n item response curves (or item characteristic functions), denoted by $T(\theta)$. That is,

$$T(\theta) = \frac{1}{n}\sum_{j=1}^{n} P_j(\theta)$$

Table 1: Binary Response Patterns of Three Different Scorings Generated by Four Rules
Suppose $a_{1j}, b_{1j}$, $j = 1, 2, \ldots, n$, are the item parameters of the logistic model estimated by the maximum likelihood procedure from the absolute-value component patterns, and $a_{2j}, b_{2j}$, $j = 1, 2, \ldots, n$, are the item parameters estimated from the sign component patterns. In other words, the two binary data matrices $(y_{ij}^a)$ and $(y_{ij}^s)$, which are obtained by the component scoring procedure, are used to estimate the item parameters and to calculate SECI4 and $T(\hat\theta_i)$ of the two components for each subject i. Thus, each subject's two component response patterns correspond to two ordered pairs: $(T_i^a, \mathrm{SECI4}_i^a)$ for the absolute value and $(T_i^s, \mathrm{SECI4}_i^s)$ for the sign. Table 2 provides these ordered pairs for the four rules given in Table 1.

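The first coordinate of each pair, $T(\hat\theta_i)$, requires an ability estimate for one component pattern. The following sketch shows one way to obtain it: a Newton-Raphson maximum likelihood estimate of $\theta$ under the two-parameter model, followed by evaluation of $T(\hat\theta)$. The routine, the constant D = 1.7, and the example item parameters are our assumptions. In the study, the item parameters are themselves estimated from the data.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)


def p2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Test characteristic function T(theta): average of the n item curves."""
    return sum(p2pl(theta, a, b) for a, b in items) / len(items)


def mle_theta(pattern, items, theta=0.0, max_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood estimate of theta under the 2PL.

    pattern: 0/1 component responses; items: list of (a_j, b_j) pairs.
    No finite MLE exists for a zero or perfect score, so those are rejected.
    """
    if sum(pattern) in (0, len(pattern)):
        raise ValueError("no finite MLE for a zero or perfect score")
    for _ in range(max_iter):
        grad = info = 0.0
        for y, (a, b) in zip(pattern, items):
            p = p2pl(theta, a, b)
            grad += D * a * (y - p)               # d log L / d theta
            info += (D * a) ** 2 * p * (1.0 - p)  # test information at theta
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    return theta


items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical (a_j, b_j)
theta_hat = mle_theta([1, 1, 0], items)
print(round(true_score(theta_hat, items), 4))
```

With equal discriminations the likelihood equation reduces to $\sum_j (y_j - P_j) = 0$, so $T(\hat\theta)$ equals the proportion correct (2/3 here). With unequal $a_j$ that equality no longer holds.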
Insert  Table  2  about  here 

An illustration of rule space with signed-number subtraction problem data

A 40-item free-response test comprising four parallel subtests of 10 signed-number subtraction items each was administered to 172 eighth graders at a local junior high school (referred to as "Test 6" hereafter; more tests will be introduced later). The traditional scoring of right or wrong answers was decomposed into a two-component scoring procedure for the absolute-value and sign parts of the responses. Thus, the signs of the responses to the 40 items were scored right or wrong, and so were the absolute values. The two component response patterns were subjected separately to the estimation of item and person parameters of the two-parameter logistic model. The item parameters estimated by the maximum likelihood procedure are listed in Appendices I and II. Twenty complete erroneous rules that are often observed (each was used by at least 3 different students) were selected for this study (Tatsuoka & Tatsuoka, 1981), and their component values in the rule space were calculated and are listed in Appendix III.

Table 2
The Values of $(T_i^a, \mathrm{SECI4}_i^a)$ and $(T_i^s, \mathrm{SECI4}_i^s)$, the Ordered Pairs of True Score and Standardized Extended Caution Index for the Four Rules Given in Table 1

| Rule | $T_i^a$ | $\mathrm{SECI4}_i^a$ | $T_i^s$ | $\mathrm{SECI4}_i^s$ |
|---|---|---|---|---|
| 16 | .2966 | -3.1570 | .4488 | -3.2816 |
| 32 | .2966 | -3.1570 | .4555 | 2.4860 |
| 12 | .7522 | -2.6485 | .8791 | .2196 |

These values of $T_i^a$, $T_i^s$, $\mathrm{SECI4}_i^a$, and $\mathrm{SECI4}_i^s$ for all students in the dataset, as well as those for the twenty erroneous rules (Tatsuoka & Tatsuoka, 1981), are mapped into the rule space. Figure 1 shows a subspace whose coordinates are $T^a$ and $T^s$. As mentioned earlier, rule space is defined as a geometric representation of the rules (including the right rule and inconsistent application of two or more rules) used by the students.

Insert  Figure  1  about  here 

Twenty small circles (o) in Figure 1 represent twenty different erroneous rules, while the plus signs (+) stand for students' responses to the 40 items. If a student responds to the 40 items by applying one of the twenty erroneous rules consistently, then his/her point should coincide with the circle representing that rule. There are two such points in Figure 1. Most points do not overlap with a rule, but some real responses are located in the vicinity of a rule.

Tatsuoka and Baillie (1982) generated data that simulate responses resulting from inconsistent application of a rule: one or two of the 40 items do not follow a given erroneous rule, and thus the component response patterns do not completely match the patterns produced by that rule. Twenty sets of simulated data based on the twenty different rules were generated and plotted in the space spanned by the two component true scores in Figure 2. Rules 16, 32, 12, and 46, together with their simulated responses clustering around them, are not well separated from each other in Figure 2. As can be seen in Figure 1, Rules 16 and 32, and Rules 12 and 46, are already very close to each other.

Insert  Figure  2  about  here 
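A sketch of how "slip" variants around a rule could be enumerated follows. Every binary pattern that differs from the rule's pattern on one or two items is generated. This is our simplification, not necessarily Tatsuoka and Baillie's procedure: it flips component scores directly, whereas a slip on a real answer may change both its sign and its absolute-value components.

```python
from itertools import combinations


def slip_patterns(rule_pattern, max_slips=2):
    """Patterns that differ from a rule's pattern on 1..max_slips items.

    Each "slip" flips one item's 0/1 score, mimicking a student who applies
    the rule consistently except on a few items.
    """
    n = len(rule_pattern)
    variants = []
    for k in range(1, max_slips + 1):
        for flipped in combinations(range(n), k):
            p = list(rule_pattern)
            for j in flipped:
                p[j] = 1 - p[j]
            variants.append(p)
    return variants


rule16_abs = [1, 0, 0, 0, 1, 1, 0, 0, 0, 1]  # Rule 16, absolute value, first ten items (Appendix III)
print(len(slip_patterns(rule16_abs)))         # 10 one-slip + 45 two-slip patterns
```

Each variant could then be mapped to a rule-space point with the estimation steps described earlier. On the full 40 items there are 40 + 780 = 820 such variants per component.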

But when the sign true score is plotted against the standardized ECI4 obtained from the sign-component scores, four distinctly different clusters are formed, as shown in Figure 3a. In a similar figure, Figure 3b, the absolute-value true score is plotted against the standardized ECI4 for absolute value, showing Rules 12 and 46 distinctly separated.

Insert Figures 3a & 3b about here

It is apparent that the values of the ECIs are capable of separating response patterns that have very close true scores or the same total scores.

Pattern Classification

In the previous section, Figure 3 showed the four erroneous rules (described in Table 1) and the inconsistent responses neighboring each of them forming four distinctly different clusters. By calculating linear classification functions for each of the four clusters and setting boundaries that divide the four regions, it is possible to classify the misconception underlying a new response by examining the region in which the new response falls, with some probability of misclassification, of course. This is the traditional procedure in pattern classification and recognition problems for determining the category to which a new stimulus belongs (Fukunaga, 1972). Thus, we have transformed our problem, diagnosing an individual student's misconceptions, into a classification problem. Tatsuoka and Baillie have developed a computer program named SIGNBUG for diagnosing erroneous rules in signed-number arithmetic tests, but the logic of its algorithm is deterministic; therefore, if a student responds to an item without using a specific rule, SIGNBUG cannot determine the rule.

As shown in Figures 2 and 3, the component response patterns yielded by using an erroneous rule consistently on the test items and the responses resulting from random "slips" on one or two items form a cluster; this is a nice feature of the rule space. An error theory that can handle response variability thus becomes applicable to our model. Since all erroneous rules that have been discovered so far in signed-number arithmetic are represented by their unique component response patterns of absolute value and sign, these rules correspond to different ordered pairs $(T_k^a, \mathrm{SECI4}_k^a)$ and $(T_k^s, \mathrm{SECI4}_k^s)$, $k = 1, \ldots, K$. If each cluster of erroneous rules could be separated from the rest of the clusters by a hyperplane in the four-dimensional rule space of signed-number subtraction problems, then responses resulting from random "slips" around a rule could be diagnosed by examining in which region (divided by the hyperplanes) they fall. This approach is often called "pattern classification." With the probabilistic approach of rule space and pattern classification, it is possible to remedy the weakness of the deterministic approach taken in SIGNBUG without losing its strength.

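The hyperplane idea above can be sketched with linear classification functions built from the cluster means and a pooled within-cluster covariance matrix, assuming equal prior probabilities. This is ordinary linear discriminant analysis in pure Python with names of our own choosing. It performs no stepwise variable selection, so it is not a reproduction of the BMDP program used in the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_lda(groups):
    """Linear classification functions from labelled rule-space points.

    groups: dict mapping a rule label to a list of points, e.g. tuples
    (T_a, T_s, SECI4_a, SECI4_s). With pooled covariance S and equal priors,
    f_k(x) = w_k . x + c_k, where w_k = S^-1 mu_k and c_k = -0.5 w_k . mu_k.
    """
    d = len(next(iter(groups.values()))[0])
    means = {k: [sum(p[i] for p in pts) / len(pts) for i in range(d)]
             for k, pts in groups.items()}
    dof = sum(len(pts) for pts in groups.values()) - len(groups)
    S = [[0.0] * d for _ in range(d)]
    for k, pts in groups.items():
        m = means[k]
        for p in pts:
            for i in range(d):
                for j in range(d):
                    S[i][j] += (p[i] - m[i]) * (p[j] - m[j]) / dof
    funcs = {}
    for k, m in means.items():
        w = solve(S, m)
        funcs[k] = (w, -0.5 * sum(wi * mi for wi, mi in zip(w, m)))
    return funcs


def classify(funcs, x):
    """Assign x to the rule whose classification function is largest."""
    def score(k):
        w, c = funcs[k]
        return sum(wi * xi for wi, xi in zip(w, x)) + c
    return max(funcs, key=score)


groups = {  # hypothetical two-dimensional clusters for illustration
    "A": [(0.0, 0.0), (1.0, 0.2), (0.2, 1.0), (1.0, 1.1)],
    "B": [(5.0, 5.0), (6.0, 5.2), (5.2, 6.0), (6.0, 6.1)],
}
funcs = fit_lda(groups)
print(classify(funcs, (0.5, 0.5)), classify(funcs, (5.5, 5.4)))
```

The boundary between any two clusters is the set where their two functions are equal, which is a hyperplane. The same code accepts four-coordinate points from the rule space.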
In this paper, the classification boundaries of 20 clusters neighboring the 20 erroneous rules of signed-number subtraction problems are shown. The list of the 20 rules plotted in Figure 4 and their descriptions are given elsewhere (Tatsuoka & Tatsuoka, 1981), and Figure 5 shows the 20 clusters around the 20 rules. A stepwise discriminant analysis (BMDP7M) was used to determine the classification functions, and Table 3 summarizes the results.


Insert  Figures  4  &  5  about  here 


Insert  Table  3  about  here 

Nineteen of the rules are perfectly classified, without any classification error, and only Rule 45 has one of its 31 samples misclassified. Four independent variables (the absolute-value and sign true scores, and SECI4 for absolute value and for sign) were used in the analysis.

Data Analysis

Changes of responses over time for individual students. The 40-item open-ended test of subtraction problems in signed-number arithmetic was administered four times to the students in a local junior high school in 1981. The first test was administered before instruction on the subject was given to the eighth graders and is referred to as "Test 3." The instruction (lessons) was written on the computer-based education system (PLATO) at the University of Illinois. Each lesson is almost one hour long. Two different instructional methods were given to two randomly selected groups: one based on the number line, and the other, which relies heavily on verbal ability, using the postman stories (Davis, 1964). We will refer to the number line group as Group 1 and the postman group as Group 2 hereafter.

The second test (Test 4) was administered after the students completed the two PLATO lessons. Subsequently, a regular class teaching subtraction skills was held. The teachers adopted a method using verbal rules (described in Birenbaum & Tatsuoka, 1980; Tatsuoka, 1981) and drilled the students for two weeks. Although they referred to the number line method in a systematic way, they did not mention the postman stories at all. After two weeks of classroom instruction, the third test (Test 5) was administered. The fourth test (Test 6), mentioned earlier, was given after the students completed multiplication and division of signed numbers.

Figure 5: Twenty Clusters of the Responses Neighboring the Twenty Rules

Since the 40-item test is composed of four parallel subtests of ten items, the errors committed by a student can be compared across the four subtests by plotting the responses to the ten items four times in the rule space obtained from the 10-item subtest. That is, the student's responses to the 40 items yield four points, each of which corresponds to one of the four parallel subtests. Table 4 shows the component values of the rule space for two students, A and B. $T_{A1}$, $T_{A2}$, $T_{A3}$, and $T_{A4}$ are obtained by averaging the estimated logistic probabilities $P_j(\hat\theta_A)$ over the 10 items in each subtest:

$$T_{AK} = \frac{1}{10}\sum_{j \in \text{subtest } K} P_j(\hat\theta_A), \qquad K = 1, 2, 3, 4.$$

$\mathrm{ECI4}_{A1}$ through $\mathrm{ECI4}_{A4}$ are calculated by using the 10 items in each subtest. Thus, four sets of two ordered pairs are obtained from Student A's responses to the four parallel subtests. Note that the coordinates of the plots in Figure 1 are based on 40 items, but the coordinates of the points in Figure 6 are obtained from 10 items. Student A studied the number line method (Group 1) and Student B studied the postman stories (Group 2). Their performances on the four subtests are shown in Table 4. Interpretations of the changes made by Students A and B across the four subtests, over the four different stages of learning designated by Tests 3 through 6, are summarized in Appendix IV. Their rule space representations are given in Figures 6 and 7.

Insert  Table  4  about  here 
Insert  Figures  6  &  7  about  here 
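The subtest true scores $T_{AK}$ defined above can be sketched as follows, taking $\hat\theta_A$ and the item parameters as given. The layout of the items (listed subtest by subtest), the constant D = 1.7, and the function name are hypothetical choices for illustration. The text does not specify them.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)


def subtest_true_scores(theta, items, n_subtests=4):
    """T_K = (1/m) * sum of P_j(theta) over the m items of subtest K.

    items: (a_j, b_j) for all items, listed subtest by subtest; this ordering
    is a hypothetical layout chosen for the sketch.
    """
    m = len(items) // n_subtests
    scores = []
    for k in range(n_subtests):
        block = items[k * m:(k + 1) * m]
        probs = [1.0 / (1.0 + math.exp(-D * a * (theta - b))) for a, b in block]
        scores.append(sum(probs) / m)
    return scores


base = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical parallel items
print(subtest_true_scores(0.0, base * 4))     # identical parameters, identical T_K
print(subtest_true_scores(0.0, base * 3 + [(1.0, 1.0)] * 3))
```

When the estimated parameters of nominally parallel items differ, the same $\hat\theta_A$ yields different $T_{AK}$, which is one reason points for identical responses need not coincide.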
The progress shown by Student A over the four tests is normal, because his average points obtained on the four subtests (marked by "o") in each of the four tests (Tests 3 through 6) gradually moved toward the top-right corner of Figure 6. The use of the right rule is designated by the point (1, 1) at the top-right corner. Variation of the four points within a test is due to the variability of responses as well as to sampling errors. Since each wave of subtests consists of ten carefully chosen parallel items, if performance on the test were perfectly consistent over the four subtests, then the estimated parameters should be identical for the 4 parallel items. But this was not the case for the actual data used to estimate the item and person parameters by the maximum likelihood procedure of the two-parameter logistic model. Therefore, the four points of the four tests do not coincide perfectly as a single point. For example, Student A produced identical responses on the first wave of subtests in Tests 3 and 4, yet the corresponding points in Figure 6 are slightly different. Also, his performances on the second, third, and fourth subtests of the first test are identical, committing the same erroneous rules, but the points designating these responses are slightly different, as shown in the figure.

Student B mastered the computations fairly well after he studied the PLATO lessons (postman stories). After the class, however, his performance was affected by the different teaching method, and he displayed confusion when converting subtraction operations into addition operations. The postman-stories approach does not teach the conversion of subtraction to addition step by step, as the teacher's verbal rules do. Thus his errors (diagnosed by our computer program SIGNBUG) clearly showed that he carried out the newly converted addition problems correctly (the verbal rule is first to change the operation sign to "+" and then to change the sign of the second number) but converted subtraction into addition problems incorrectly.

Changes of responses at different points in time

Figures 8 through 11 are plots of the responses made by all students who took Tests 4 through 6. The coordinates of the plots in the figures are the true scores (Lord & Novick, 1968) of the two component scores, absolute values and signs. The trend of changes in the points of the four tests is clear: as the stages of learning advance toward mastery of the right rule, the cluster moves toward the top-right corner of the space spanned by the two component true scores [(1, 1) represents the use of the right rule]. The points from Test 5 in Figure 9 cluster most closely around the point (1, 1), the right rule.

Insert  Figures  8,  9,  10  &  11  about  here 


But the points from Test 6 in Figure 11 no longer cluster as closely to the top-right corner of the space as the points of Test 5 do. Learning new material (i.e., multiplication and division of signed numbers) after the completion of the subtraction unit affected performance on Test 6.

Summary and Discussion

This study introduced a probabilistic model utilizing item response theory for dealing with a variety of misconceptions. The model can be used for evaluating the transitional behavior of error types, advancement of learning stages, or the stability and persistence of particular misconceptions. Moreover, it can be used for relating the "behaviors" of errors to other criterion measures such as creativity, anxiety, and motivation.

Plotting of All Students Participating in the Experiments (Test 3, Postman Stories "o" and Number Line Groups)

One of several person indices based on item response theory was used to formulate "rule space," which is a geometric representation of erroneous rules of operation. The index in question, ECI4, which is used primarily for detecting aberrant response patterns, has proved to be effective for separating clusters of response patterns from one another.

Each cluster comprises the response patterns yielded by some rule and its "slips," which are due to partially consistent application of that rule. The model enables us to apply pattern classification techniques to distinguish a cluster of response patterns around an erroneous rule from other clusters. Thus, the probability of misclassification should be obtainable. However, rigorous investigation along this line is left for subsequent work.

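As one illustration of how such a misclassification probability could be obtained, consider two clusters that are multivariate normal with a common covariance matrix and equal priors, which is an assumption the study does not make. The standard result for the linear rule is an error rate of $\Phi(-\Delta/2)$, where $\Delta$ is the Mahalanobis distance between the cluster means (see, e.g., Fukunaga, 1972). The function names below are ours.

```python
import math


def mahalanobis_sq(x, y, s_inv):
    """Squared Mahalanobis distance (x - y)' S^-1 (x - y)."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return sum(d[i] * s_inv[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))


def two_class_error(mu1, mu2, s_inv):
    """Error rate Phi(-Delta/2) of the linear rule for two normal classes
    with a common covariance matrix and equal prior probabilities."""
    delta = math.sqrt(mahalanobis_sq(mu1, mu2, s_inv))
    return 0.5 * (1.0 + math.erf(-delta / (2.0 * math.sqrt(2.0))))


identity = [[1.0, 0.0], [0.0, 1.0]]
print(round(two_class_error((0.0, 0.0), (2.0, 0.0), identity), 4))  # Phi(-1)
```

With more than two clusters, the pairwise values give a rough guide to which rules are most easily confused.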
The examples in this study only suggest how the rule space approach works; the results of further statistical analyses are not discussed.


Subtraction  Problems:  Absolute  Value  Scoring 


Appendix III


Coordinates of the 20 Erroneous Rules in the Rule Space for a 40-Item
Signed-Number Subtraction Test

Rule  Component  True Score     SECI  Component Response Pattern*        θ

   3  absolute        .7084   3.3315  0101101011                    -.3550
      sign            .7510   1.3322  0101111111                    -.5516
   6  absolute        .4761   2.9877  0100101001                    -.8889
      sign            .6436   -.1234  0101111101                    -.3726
   7  absolute        .3987   6.3432  0100001000                   -1.1116
      sign            .5291  -2.3331  0101111100                   -1.1457
  11  absolute        .6837    .6337  1011010110                    -.4124
      sign            .7843  -2.6399  1011111110                    -.4322
  12  absolute        .7522  -2.6485  1011110111                    -.2459
      sign            .8791    .2196  1011111111                     .0117
  13  absolute        .4488   1.5525  1010010100                    -.9617
      sign            .6818  -2.7285  1011111100                    -.7733
  15  absolute        .6591   3.0325  1110011100                    -.4681
      sign            .7836  -2.4379  1111111100                    -.4349
  16  absolute        .2966  -3.1570  1000110001                   -1.5288
      sign            .4488  -3.2816  0001111100                   -1.4754
  17  absolute        .2966  -3.1570  1000110001                   -1.5288
      sign            .6818  -2.7285  1011111100                    -.7733
  18  absolute        .2966  -3.1570  1000110001                   -1.5288
      sign            .8791    .2196  1011111111                     .0117
  21  absolute        .8338   3.4064  0111001110                     .0051
      sign            .6584   3.9372  0101011011                    -.8436
  22  absolute        .2966  -3.1570  1000110001                   -1.5288
      sign            .6584   3.9372  0101011011                    -.8436
  25  absolute        .2006    .8670  0000100001                   -2.2077
      sign            .5457  -1.4002  0001111101                   -1.1738
  30  absolute        .2966  -3.1570  1000110001                   -1.5288
      sign            .7664  -1.5339  0111111110                    -.4978
  32  absolute        .2966  -3.1570  1000110001                   -1.5288
      sign            .4555   2.4860  1010100100                   -1.4534
  34  absolute        .4488   1.5525  1010010100                    -.9617
      sign            .7664  -1.5839  0111111110                    -.4978
  37  absolute        .4891   1.5936  1001010010                    -.8558
      sign            .3837    .8349  0100111000                   -1.7053
  38  absolute        .7084   3.3315  0101101011                    -.3550
      sign            .4488  -3.2816  0001111100                   -1.4754
  43  absolute        .2966  -3.1570  1000110001                   -1.5288
      sign            .5186  -1.7692  1001101100                   -1.2549
  46  absolute        .7644    .0833  1101111011                    -.2133
      sign            .8727   1.6241  1101111111                    -.0265

*Since the test consists of four parallel subtests (i.e., each task has
four parallel items), only the response patterns of the first ten items
are given here.
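As the table shows, rules with identical component response patterns (for example, 1000110001) have identical coordinates, so that component alone cannot tell them apart. The minimal sketch below groups the absolute-value rows of Appendix III by their (θ, SECI) coordinates. The values are copied from the table, reading the scanned rule labels as 16, 18 and 46; the function name `shared_points` is hypothetical.

```python
from collections import defaultdict

# (rule, true score, SECI, theta) for the absolute-value component, from Appendix III.
ABSOLUTE = [
    (3, .7084, 3.3315, -.3550), (6, .4761, 2.9877, -.8889),
    (7, .3987, 6.3432, -1.1116), (11, .6837, .6337, -.4124),
    (12, .7522, -2.6485, -.2459), (13, .4488, 1.5525, -.9617),
    (15, .6591, 3.0325, -.4681), (16, .2966, -3.1570, -1.5288),
    (17, .2966, -3.1570, -1.5288), (18, .2966, -3.1570, -1.5288),
    (21, .8338, 3.4064, .0051), (22, .2966, -3.1570, -1.5288),
    (25, .2006, .8670, -2.2077), (30, .2966, -3.1570, -1.5288),
    (32, .2966, -3.1570, -1.5288), (34, .4488, 1.5525, -.9617),
    (37, .4891, 1.5936, -.8558), (38, .7084, 3.3315, -.3550),
    (43, .2966, -3.1570, -1.5288), (46, .7644, .0833, -.2133),
]


def shared_points(rows):
    """Map each (theta, SECI) point occupied by two or more rules to those rules."""
    groups = defaultdict(list)
    for rule, _true_score, seci, theta in rows:
        groups[(theta, seci)].append(rule)
    return {point: rules for point, rules in groups.items() if len(rules) > 1}


for (theta, seci), rules in shared_points(ABSOLUTE).items():
    print(f"theta={theta:+.4f}, SECI={seci:+.4f}: rules {rules}")
```

Seven rules (16, 17, 18, 22, 30, 32 and 43) collapse onto a single absolute-value point, which is why both components are listed for each rule.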


Appendix IV


Stories of Student A's and Student B's Performances at
the Four Different Learning Stages

Student  A 


Pretest (Test 3): He studied the number line method on the PLATO system
for about an hour in January 1980, when he was in the seventh grade. He
had not been exposed to any kind of instruction related to signed numbers
before. In September 1980, he took a 64-item signed-number test along
with 40 other eighth graders before a revised version of the number line
lesson on the PLATO system was given. His diagnosed rules are as follows:


Subtest 1: His rule for the absolute value of the answer was always to
subtract the smaller number from the larger number. His rule for the sign
of the answer was to take the sign of the larger number. However, he
applied the rule consistently only to items 2, 4, 6, 7, 8, 9, 12, 13 and
16, as shown in Table 2.

Subtest 2: He applied the rule described above consistently to all the
items in the subtest.

Subtest 3: His rule was the same as that used in Subtest 2.

Subtest 4: His rule was the same as that used in Subtest 2.
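Student A's pretest rule can be written as a small function, which also shows how a single rule generates a 0/1 response pattern of the kind listed in Appendix III. This is a minimal sketch under stated assumptions: "larger" and "smaller" are read as larger and smaller in absolute value, the items are hypothetical pairs (a, b) standing for a - b rather than the actual test items, and all function names are illustrative.

```python
def correct(a: int, b: int) -> int:
    """Correct answer to the subtraction item a - b."""
    return a - b


def larger_minus_smaller_rule(a: int, b: int) -> int:
    """Student A's pretest rule, as interpreted here: subtract the smaller
    absolute value from the larger and give the result the sign of the
    number that is larger in absolute value."""
    big, small = (a, b) if abs(a) >= abs(b) else (b, a)
    magnitude = abs(big) - abs(small)
    return magnitude if big >= 0 else -magnitude


def response_pattern(rule, items) -> str:
    """Score 1 where the rule reproduces the correct answer, 0 otherwise."""
    return "".join("1" if rule(a, b) == correct(a, b) else "0" for a, b in items)


# Hypothetical items (a, b), each standing for the problem a - b.
ITEMS = [(12, 3), (3, 12), (-12, -3), (-3, -12),
         (12, -3), (-12, 3), (3, -12), (-3, 12)]

print(response_pattern(larger_minus_smaller_rule, ITEMS))  # 10100000
```

On these eight items the rule is right only for the two in which both numbers share a sign and the first is larger in absolute value, which illustrates how an erroneous rule can be correct on some item types and wrong on others.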


The test given after the one-hour PLATO lesson (Test 4):

Test 4 was administered to Student A after he studied a number line lesson
on the PLATO system.


Subtest 1: He still subtracted the smaller number from the larger number
and took the sign of the larger number for the items described in
Subtest 1 of the pretest (items 2, 4, 6, 7, 8, 9, 12, 13 and 16).

Subtest 2: He used basically the same rule but applied it to different
subsets of items for the sign and the absolute value operations.

Subtest 3: He suddenly changed to a new rule. If the first number was
smaller in absolute value, he subtracted the smaller number from the
larger one; if the first number was larger in absolute value, he added
the two numbers. He used this rule for 8 items (all except the L-S and
S-L types). His performance on the sign operation was inconsistent, so
his sign rule could not be determined.

Subtest 4: His rule changed again. This time he changed the operation
sign to "+" and applied the right addition rule to the items having
explicit signs in the second number.
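The two rules Student A used on Test 4 can be sketched the same way (Student B's rule in Subtest 3 of her pretest is described in the same terms as the Subtest 4 rule). As before, the items are hypothetical (a, b) pairs standing for a - b, and the function names are illustrative. Because his Subtest 3 sign rule was undetermined, only its absolute-value component is scored, reading "added the two numbers" as adding their absolute values (an assumption); the Subtest 4 rule is scored on its sign component.

```python
def sign(x: int) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def mixed_magnitude_rule(a: int, b: int) -> int:
    """Test 4, Subtest 3 (absolute-value component only): subtract when the
    first number is smaller in absolute value, otherwise add the absolute values."""
    if abs(a) < abs(b):
        return abs(b) - abs(a)
    return abs(a) + abs(b)


def plus_rule(a: int, b: int) -> int:
    """Test 4, Subtest 4: change the operation sign to "+" but leave the sign
    of the second number unchanged, then add correctly."""
    return a + b


# Hypothetical items (a, b), each standing for the problem a - b.
ITEMS = [(12, 3), (3, 12), (-12, -3), (-3, -12),
         (12, -3), (-12, 3), (3, -12), (-3, 12)]

# Absolute-value component of the Subtest 3 rule.
absolute = "".join("1" if mixed_magnitude_rule(a, b) == abs(a - b) else "0"
                   for a, b in ITEMS)
# Sign component of the Subtest 4 rule.
signs = "".join("1" if sign(plus_rule(a, b)) == sign(a - b) else "0"
                for a, b in ITEMS)
print(absolute)  # 01011100
print(signs)     # 10101100
```

The "+" rule can never reproduce the absolute value of a - b when both numbers are nonzero, since (a + b)^2 - (a - b)^2 = 4ab; it can, however, still give the correct sign on some items, so its two components leave different traces in the response pattern.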


Classroom instruction started with an explanation of the concept of the
number line. After students mastered the addition skills based on the
number line method, the teachers switched their instruction to the use
of verbal rules (Birenbaum & Tatsuoka, 1980; Chaiklin, 1982).
Subtraction problems were therefore taught with the verbal rules.

Subtest 1: He learned to use the right rule.

Subtest 2: He applied the right rule for the absolute value consistently
to the items having explicit signs in the second number. He used the
correct sign for all items.

Subtest 3: As in Subtest 2, he applied the right rule consistently when
determining the signs of the answers for all items. He obtained the
absolute values correctly for only a subset of the items (items with
parentheses and the L-S and S-L types).

Subtest 4: He used the right rule for all items.


The test given after 2 weeks of classroom instruction (Test 6):

Subtest 1: He used the right rule for all items except item 12.

Subtest 2: He applied the right rule successfully to all the items.

Subtest 3: The result was the same as in Subtest 2.

Subtest 4: He used the right rule for all items except items 50 and 56.


Student A's performances are plotted in Figure 6.

Student  B 

Pretest (Test 3): She studied the postman stories written on the PLATO
system for about an hour in January 1980. She had not been exposed to
any kind of instruction related to signed numbers. At the beginning of
the 1980-81 fiscal year, she took a 64-item signed-number test along with
40 other eighth graders before the revised version of the postman stories
was given to her class. Her diagnosed rules are as follows:


Subtest 1: Her rule for the absolute value of the answer was
undetermined. For the items without parentheses, the signs of her
answers followed the right rule. The random nature of her answers
suggests she was not sure what to do with the parentheses.

Subtest 2: Her rule for the absolute value of the answer was
undetermined. She used the right rule for items whose second number had
no explicitly written sign (hidden sign).

Subtest 3: Her rule for the absolute value of the answer was again
undetermined. For the sign part, she changed her rule and completed the
subtraction operation by skipping the step of changing the sign of the
second number. That is, she changed the operation sign to "+" and
applied the right rule for addition to the newly converted addition
problems. However, she did not apply this rule consistently to the items
having hidden signs in the second number.

Subtest 4: Her performance was identical to her performance on
Subtest 3.

The test given after the one-hour PLATO lesson (Test 4):

Subtest 1: She applied the right rule to the items with parentheses and
to the L-S and S-L types.

Subtest 2: She used the right rule for taking the absolute value and
obtained the right answers for the items with parentheses and the L-S
and S-L types. However, her rule for the signs of the answers was not
consistent, so it remained undetermined.

Subtest 3: She answered all 10 items with the right rule.

Subtest 4: She used the right rule for nine items; the exception was the
-L - (-S) type, where her error was due to mistyping a sign in the
answer.


The test given after 2 weeks of classroom instruction (Test 5):

Subtest 1: Her rule regressed. She did not change the sign of the second
number at all. Instead, she changed the operation sign "-" to "+" and
consistently applied the right rule to the items having explicit signs
in the second number.

Subtest 2: She used the same rule described above. Her rule for taking
the absolute value of the answer was not consistent, but her sign
operation was consistent for the items whose second number had an
explicit sign.

Subtest 4: Her performance was identical to her performance on
Subtest 3.

The test given after instruction on multiplication and division of signed
numbers (Test 6): She applied the right rule consistently across the four
subtests and answered all 40 items in the test correctly. This student's
performance is shown in Figure 7.